8 research outputs found

    On the detection of SOurce COde re-use

    © Owner/Author | ACM, 2014. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in FIRE '14 Proceedings of the Forum for Information Retrieval Evaluation, http://dx.doi.org/10.1145/2824864.2824878

    This paper summarizes the goals, organization and results of the first SOCO competitive evaluation campaign for systems that automatically detect source code re-use. The detection of source code re-use is an important research field for both the software industry and academia. Accordingly, the PAN@FIRE track named SOurce COde Re-use (SOCO) focused on the detection of re-used source code in the C/C++ and Java programming languages. Participant systems were asked to annotate several source codes according to whether or not they represent cases of source code re-use. In total, five teams submitted 17 runs. The training set consisted of annotations made by several experts, a feature that makes the SOCO 2014 collection a useful data set for future evaluations and, at the same time, establishes a standard evaluation framework for future research on the posed shared task.

    PAN@FIRE (SOCO) has been organised in the framework of the WIQ-EI (EC IRSES grant n. 269180) and DIANA-APPLICATIONS (TIN2012-38603-C02-01) research projects. The work of the last author was supported by CONACyT Mexico Project Grant CB-2010/153315, and SEP-PROMEP UAM-PTC-380/48510349.

    Flores Sáez, E.; Rosso, P.; Moreno Boronat, LA.; Villatoro-Tello, E. (2014). On the detection of SOurce COde re-use. In FIRE '14 Proceedings of the Forum for Information Retrieval Evaluation. ACM. 21-30. https://doi.org/10.1145/2824864.2824878

    Towards the detection of cross-language source code reuse

    The Internet has made huge amounts of information available, including source code. Source code repositories and, in general, programming-related websites facilitate its reuse. In this work, we propose a simple approach to the detection of cross-language source code reuse, a barely investigated problem. Our preliminary experiments, based on the comparison of character n-grams, show that considering different sections of the code (i.e., comments, code, reserved words, etc.) leads to different results. When considering three programming languages (C++, Java, and Python), the best result is obtained when comments are discarded and the entire source code is considered.

    This work has been developed with the support of the project TEXT-ENTERPRISE 2.0: Text comprehension techniques applied to the needs of the Enterprise 2.0 (MICINN, Spain, TIN2009-13391-C04-03 (Plan I+D+i)).

    Flores Sáez, E.; Barrón Cedeño, LA.; Rosso, P.; Moreno Boronat, LA. (2011). Towards the detection of cross-language source code reuse. In Natural Language Processing and Information Systems. Springer Verlag (Germany). 6716:250-253. https://doi.org/10.1007/978-3-642-22327-3_31
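The abstract above names character n-gram comparison with comments discarded, but it does not state the n-gram size or the similarity measure. The following is a minimal sketch under stated assumptions: character trigrams, raw frequency counts and cosine similarity are illustrative choices, and the function names (`strip_comments`, `char_ngrams`, `cosine_similarity`, `reuse_score`) are hypothetical rather than taken from the paper.

```python
from collections import Counter
import math
import re


def strip_comments(source: str, hash_comments: bool = False) -> str:
    """Remove comments with simple regexes (a rough heuristic, not a parser).

    Block (/* */) and line (//) comments cover C++ and Java. Set
    hash_comments=True for Python sources; leave it False for C++ so that
    preprocessor lines such as #include are kept.
    """
    source = re.sub(r"/\*.*?\*/", " ", source, flags=re.DOTALL)
    source = re.sub(r"//[^\n]*", " ", source)
    if hash_comments:
        source = re.sub(r"#[^\n]*", " ", source)
    return source


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams after lowercasing and collapsing whitespace."""
    normalized = re.sub(r"\s+", " ", text.lower()).strip()
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two n-gram count vectors (0.0 if either is empty)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def reuse_score(src_a: str, src_b: str, n: int = 3,
                hash_a: bool = False, hash_b: bool = False) -> float:
    """Similarity of two source files once comments are discarded (assumed setup)."""
    grams_a = char_ngrams(strip_comments(src_a, hash_a), n)
    grams_b = char_ngrams(strip_comments(src_b, hash_b), n)
    return cosine_similarity(grams_a, grams_b)
```

A pair would be flagged as a candidate for re-use when `reuse_score` exceeds some threshold; the threshold is not given in the abstract and would have to be tuned on annotated data.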

    PAN@FIRE: Overview of SOCO Track on the Detection of SOurce COde Re-use

    © Owner/Author. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published by ACM in Proceedings of the Forum for Information Retrieval Evaluation, FIRE '14. http://dx.doi.org/10.1145/2824864.2824878

    This paper summarizes the goals, organization and results of the first SOCO competitive evaluation campaign for systems that automatically detect source code re-use. The detection of source code re-use is an important research field for both the software industry and academia. Accordingly, the PAN@FIRE task named SOurce COde Re-use (SOCO) focused on the detection of re-used source code in the C/C++ and Java programming languages. Participant systems were asked to annotate several source codes according to whether or not they represent cases of source code re-use. In total, three teams participated and submitted 13 runs. The training set consisted of annotations made by several experts, a feature that makes the SOCO 2014 collection a useful data set for future evaluations and, at the same time, establishes a standard evaluation framework for future research.

    PAN@FIRE (SOCO) has been organised in the framework of the WIQ-EI (EC IRSES grant n. 269180) and DIANA-APPLICATIONS (TIN2012-38603-C02-01) research projects. The work of the last author was supported by CONACyT Mexico Project Grant CB-2010/153315, and SEP-PROMEP UAM-PTC-380/48510349.

    Flores Sáez, E.; Rosso, P.; Moreno Boronat, LA.; Villatoro-Tello, E. (2014). PAN@FIRE: Overview of SOCO Track on the Detection of SOurce COde Re-use. ACM. http://hdl.handle.net/10251/66414

    Cross-language source code re-use detection using latent semantic analysis

    Nowadays, the Internet is the main source of information: blogs, encyclopedias, discussion forums, source code repositories, and many more resources are just one click away. The temptation to re-use these materials is very high. Even source code is easily available through a simple Web search. There is therefore a need to detect potential instances of source code re-use. Source code re-use detection has usually been approached by comparing source codes in their compiled versions. When dealing with cross-language source code re-use, traditional approaches can handle only the programming languages supported by the compiler. We assume that a source code is a piece of text, with its own syntax and structure, so we aim at applying models for free-text re-use detection to source code. In this paper we compare a Latent Semantic Analysis (LSA) approach with previously used text re-use detection models for measuring cross-language similarity in source code. The LSA-based approach shows slightly better results than the other models and is able to distinguish between re-used and related source codes with high performance.

    This work was partially supported by Universitat Politècnica de València, WIQ-EI (IRSES grant n. 269180), and the DIANA-APPLICATIONS (TIN2012-38603-C02-01) project. The work of the fourth author is also supported by the VLC/CAMPUS Microcluster on Multimodal Interaction in Intelligent Systems.

    Flores Sáez, E.; Barrón-Cedeño, LA.; Moreno Boronat, LA.; Rosso, P. (2015). Cross-language source code re-use detection using latent semantic analysis. Journal of Universal Computer Science. 21(13):1708-1725. https://doi.org/10.3217/jucs-021-13-1708

    A low-power RF front-end for 2.5 GHz receivers

    © 2008 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

    This paper presents a low-power, low-cost front end for a direct-conversion 2.5 GHz ISM band receiver composed of a 16 kV HBM ESD-protected LNA, differential Gilbert-cell mixers, and high-pass filters for DC offset cancellation. The whole front end is implemented in a 2P6M 0.18 µm RFCMOS process. It exhibits a voltage gain of 24 dB and an SSB noise figure of 8.4 dB, which make it suitable for most 2.5 GHz short-range wireless communication transceivers. The achieved power consumption is only 1.06 mW from a 1.2 V power supply.

    Peer reviewed. Postprint (published version).
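To put the reported figures in linear terms, the short calculation below converts the 24 dB voltage gain to a ratio and derives the supply current implied by 1.06 mW at 1.2 V. It assumes the gain is expressed as 20 log10 of a voltage ratio and that the whole 1.06 mW is drawn from the single 1.2 V supply; the symbols A_v, P, V_DD and I_supply are introduced here for illustration and do not appear in the abstract.

```latex
\begin{align*}
A_v &= 10^{24/20} \approx 15.8\ \text{V/V} \\
I_{\text{supply}} &= \frac{P}{V_{DD}} = \frac{1.06\ \text{mW}}{1.2\ \text{V}} \approx 0.88\ \text{mA}
\end{align*}
```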

    IARG-AnCora: Annotating AnCora corpus with implicit arguments

    IARG-AnCora aims to annotate, with thematic roles, the implicit arguments of deverbal nominalizations in the AnCora corpus. This corpus will serve as the basis for automatic semantic role labeling systems based on machine learning techniques. Semantic analyzers are essential components of current language technology applications, where a deeper understanding of the text is needed to make higher-level inferences and thereby obtain qualitative improvements in the results.

    Complementary action (FFI2011-13737-E), associated with the TextMess 2.0 project (TIN2009-13391-C04-03/04).

    Taulé Delor, M.; Peris, A.; Martí Antonín, MA.; Moreno Boronat, LA.; Rodríguez, H.; Moreda, P. (2012). IARG-AnCora: Anotación de los corpus AnCora con argumentos implícitos. Procesamiento del Lenguaje Natural. 49:181-184. http://hdl.handle.net/10251/29863

    MALLBA: A library of skeletons for combinatorial optimisation

    The MALLBA project tackles the resolution of combinatorial optimization problems using algorithmic skeletons implemented in C++. MALLBA offers three families of generic resolution methods: exact, heuristic and hybrid. Moreover, for each resolution method, MALLBA provides three different implementations: sequential, parallel for local area networks, and parallel for wide area networks (currently under development). This paper shows the architecture of the MALLBA library, presents some of its skeletons and offers several computational results to show the viability of the approach.

    MALLBA: a library of skeletons for combinatorial optimisation

    No full text
    The MALLBA project tackles the resolution of combinatorial optimization problems using algorithmic skeletons implemented in C++. MALLBA offers three families of generic resolution methods: exact, heuristic and hybrid. Moreover, for each resolution method, MALLBA provides three different implementations: sequential, parallel for local area networks, and parallel for wide area networks (currently under development). This paper shows the architecture of the MALLBA library, presents some of its skeletons and offers several computational results to show the viability of the approach.